MEPS_IBTS_all_min_100

Andrew Edwards

Combining the four MEPS_IBTS_ vignettes into one to repeat the analyses

Here we combine the essential elements of the four vignettes

to repeat the analyses with an alternative value for the minimum body size (100 g, as opposed to the original 4 g), as suggested by a reviewer (see Supplementary Material). We retain only the necessary details of the four vignettes; the four vignettes should be used for initial understanding, this one is to repeat the analysis with alternate assumptions (or data). We simplify much of the text here as well, to focus on the code.

Pre-processing of IBTS data to get desired data for MEPS Table 1

This is simplified from MEPS_IBTS_1.html – see that vignette for full explanations.

library(sizeSpectra)

Use the saved original data set:

dim(dataOrig)
#> [1] 178435     13
names(dataOrig)
#>  [1] "AphiaID"              "Survey"               "Year"                
#>  [4] "Quarter"              "Area"                 "Species"             
#>  [7] "LngtClas"             "CPUE_number_per_hour" "Taxonomic.group"     
#> [10] "a"                    "b"                    "weight_g"            
#> [13] "CPUE_bio_per_hour"

Keep the desired columns, noting that data will change a lot in the following code and will end up in a standard format:

numAreas = length(unique(dataOrig$Area))
colsKeep = c("Year",
             "AphiaID",
             "LngtClas",
             "CPUE_number_per_hour",
             "a",
             "b",
             "weight_g",
             "CPUE_bio_per_hour")
colsDiscard = setdiff(names(dataOrig), colsKeep)

data = sizeSpectra::s_select(dataOrig, colsKeep)   # uses Sebastian Kranz's s_dplyr_funcs.r
  1. Rename the columns:
if(sum( colsKeep != c("Year", "AphiaID", "LngtClas", "CPUE_number_per_hour",
    "a", "b", "weight_g", "CPUE_bio_per_hour")) > 0)
       { stop("Need to adjust renaming") }
names(data) = c("Year", "SpecCode", "LngtClass", "Number", "LWa", "LWb",
         "bodyMass", "CPUE_bio_per_hour")
# CPUE_bio_per_hour is Number * bodyMass
  1. Make cm not mm:
data$LngtClass = data$LngtClass/10
  1. Rearrange the order to be more intuitive:
data = dplyr::arrange(data, Year, SpecCode, LngtClass)

Aggregate multiple counts of the same length fish of the same species in the same year:

data = dplyr::summarise(dplyr::group_by(data,
                                        Year,
                                        SpecCode,
                                        LngtClass),
                        "Number" = sum(Number)/numAreas,
                        "LWa" = unique(LWa),
                        "LWb" = unique(LWb),
                        "bodyMass" = unique(bodyMass))

Number is the average number (of each species and length) caught per hour of trawling across all seven areas.

Up until here data is the same as it was in MEPS_IBTS_1.html.

Now, instead of only including body-mass classes above 4 g (which we did following Blanchard et al., 2005), we set the cut-off to be 100 g to see if the larger fish are fit well with the bounded power-law fit only to the larger fish:

range(data$LngtClass)
#> [1]   0 150
range(data$bodyMass)
#> [1]     0.00 35630.04
sum(data$bodyMass == 0)
#> [1] 2549
sum(data$bodyMass < 100 )
#> [1] 21781
data = dplyr::filter(data, bodyMass >= 100)
range(data$bodyMass)
#> [1]   100.00 35630.04
data
#> # A tibble: 27,410 x 7
#> # Groups:   Year, SpecCode [1,423]
#>     Year SpecCode LngtClass  Number    LWa   LWb bodyMass
#>    <int>    <int>     <dbl>   <dbl>  <dbl> <dbl>    <dbl>
#>  1  1986   105814        45 0.00714 0.0031  3.03     315.
#>  2  1986   105814        46 0.00714 0.0031  3.03     337.
#>  3  1986   105814        50 0.00714 0.0031  3.03     434.
#>  4  1986   105814        52 0.0293  0.0031  3.03     489.
#>  5  1986   105814        53 0.0109  0.0031  3.03     518.
#>  6  1986   105814        54 0.0113  0.0031  3.03     548.
#>  7  1986   105814        56 0.0218  0.0031  3.03     612.
#>  8  1986   105814        57 0.0188  0.0031  3.03     646.
#>  9  1986   105814        58 0.0381  0.0031  3.03     680.
#> 10  1986   105814        59 0.0327  0.0031  3.03     717.
#> # ... with 27,400 more rows
summary(data)
#>       Year         SpecCode        LngtClass          Number         
#>  Min.   :1986   Min.   :101170   Min.   :  9.00   Min.   :  0.00058  
#>  1st Qu.:1993   1st Qu.:126436   1st Qu.: 32.00   1st Qu.:  0.00866  
#>  Median :2001   Median :126441   Median : 42.00   Median :  0.02597  
#>  Mean   :2001   Mean   :122182   Mean   : 47.96   Mean   :  1.18971  
#>  3rd Qu.:2009   3rd Qu.:127138   3rd Qu.: 60.00   3rd Qu.:  0.11626  
#>  Max.   :2015   Max.   :271564   Max.   :150.00   Max.   :265.51052  
#>       LWa                LWb           bodyMass      
#>  Min.   :0.000300   Min.   :1.797   Min.   :  100.0  
#>  1st Qu.:0.003300   1st Qu.:3.095   1st Qu.:  264.6  
#>  Median :0.004200   Median :3.175   Median :  609.5  
#>  Mean   :0.008075   Mean   :3.155   Mean   : 1629.8  
#>  3rd Qu.:0.005600   3rd Qu.:3.243   3rd Qu.: 1581.0  
#>  Max.   :0.235000   Max.   :3.475   Max.   :35630.0

Total number of fish in this dataset is

sum(data$Number)
#> [1] 32609.96

The unique length classes are:

sort(unique(data$LngtClass))
#>   [1]   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
#>  [18]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42
#>  [35]  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59
#>  [52]  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
#>  [69]  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93
#>  [86]  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110
#> [103] 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
#> [120] 128 129 130 131 132 133 134 135 138 139 140 142 144 145 146 149 150

Need this to stop earlier groups being kept (can mess up later code):

data = dplyr::ungroup(data)

The equivalent Table 1 (first six and last six rows of the data) is now as follows, where the last six rows of the original Table 1 are no longer in the data as they had body masses $<$100 g:

data_biomass <- dplyr::mutate(data,
                              Biomass = Number * bodyMass)
knitr::kable(rbind(data_biomass[1:6,],
                   data_biomass[(nrow(data_biomass)-5):nrow(data_biomass),
                                     ]),
             digits=c(0, 0, 0, 3, 4, 4, 2, 2))
Year SpecCode LngtClass Number LWa LWb bodyMass Biomass
1986 105814 45 0.007 0.0031 3.029 315.46 2.25
1986 105814 46 0.007 0.0031 3.029 337.17 2.41
1986 105814 50 0.007 0.0031 3.029 434.05 3.10
1986 105814 52 0.029 0.0031 3.029 488.81 14.33
1986 105814 53 0.011 0.0031 3.029 517.84 5.65
1986 105814 54 0.011 0.0031 3.029 548.00 6.18
2015 150637 39 0.234 0.0034 3.260 522.82 122.17
2015 150637 40 0.103 0.0034 3.260 567.80 58.60
2015 150637 41 0.059 0.0034 3.260 615.40 36.01
2015 150637 42 0.007 0.0034 3.260 665.69 4.42
2015 150637 43 0.017 0.0034 3.260 718.76 11.94
2015 150637 44 0.010 0.0034 3.260 774.70 7.72

Analyses of data using the original eight methods

This is simplified from MEPS_IBTS_2.html – see that vignette for full explanations.

See if the number of length classes or species seems to change over time:

dataSumm = dplyr::summarise(dplyr::group_by(data, Year),
                            uniqLngtClass = length(unique(LngtClass)),
                            uniqSpec = length(unique(SpecCode)))
par(mfrow=c(2,1)) #7,1))

plot(dataSumm$Year, dataSumm$uniqLngtClass, xlab="Year",
     ylab="No. unique length classes", type="o",
     ylim=c(0, max(dataSumm$uniqLngtClass)))
plot(dataSumm$Year, dataSumm$uniqSpec, xlab="Year",
     ylab="No. unique species", type="o", ylim=c(0, max(dataSumm$uniqSpec)))

As for the original full data set, there do not look to be any serious issue with this (no drastic changes in, for example, species identification through time).

The remaining eightMethods.count() and plotting code from MEPS_IBTS_2.html is not repeated here since it takes a few hours to run, and our focus is on repeating the MLEbins method.

Analyses of IBTS data using the MLEbins method

This section analyses the IBTS data using the MLEbins method, using the essential calculations from MEPS_IBTS_MLEbins.html.

Determining which rows are 0.5 cm bins

LngtClass for all species is the minimum value of a 1-cm-width bin, except for herring (Clupea harengus) and sprat (Sprattus sprattus) for which lengths are rounded down to 0.5 cm values (so the bins are 0.5-cm wide). The SpecCode values for these are:

Verified earlier that only these two species have 0.5 cm values for LngtClass.

Append the max of the bin breaks for each row

So LngtClass is the minimum of each length bin. Need to work out the maximum of each length bin LengthMax, and then use the species-specific length-weight relationships to give the min (wmin) and max (wmax) of each body-mass bin. So create dataBin table dataframe that has LengthMax, wmin and wmax as extra columns for each row:

No need to re-plot the body-mass bins for each species, since they won’t have changed from the MEPS paper (there will just be no bins below 100 g, and so some species will be omitted).

Likelihood calculations using MLEbins method

Now use the MLEbins method to fit each year of data in turn.

fullYears = sort(unique(dataBin$Year))
# Do a loop for each year, saving all the results in MLEbins.nSeaFung.new
for(iii in 1:length(fullYears))
  {
    dataBinForLike = dplyr::filter(dataBin,
                                   Year == fullYears[iii])
    dataBinForLike = dplyr::select(dataBinForLike,
                                   SpecCode,
                                   wmin,
                                   wmax,
                                   Number)
    n = sum(dataBinForLike$Number)
    xmin = min(dataBinForLike$wmin)
    xmax = max(dataBinForLike$wmax)

    MLEbins.nSeaFung.oneyear.new  = calcLike(negLL.fn = negLL.PLB.bins.species,
                                             p = -1.9,
                                             suppress.warnings = TRUE,
                                             dataBinForLike = dataBinForLike,
                                             n = n,
                                             xmin = xmin,
                                             xmax = xmax)

    if(iii == 1)
    {
      MLEbins.nSeaFung.new = data.frame(Year = fullYears[iii],
                                        xmin=xmin,
                                        xmax=xmax,
                                        n=n,
                                        b=MLEbins.nSeaFung.oneyear.new$MLE,
                                        confMin=MLEbins.nSeaFung.oneyear.new$conf[1],
                                        confMax=MLEbins.nSeaFung.oneyear.new$conf[2])
    } else {
      MLEbins.nSeaFung.new = rbind(MLEbins.nSeaFung.new,
                                   c(fullYears[iii],
                                     xmin,
                                     xmax,
                                     n,
                                     MLEbins.nSeaFung.oneyear.new$MLE,
                                     MLEbins.nSeaFung.oneyear.new$conf[1],
                                     MLEbins.nSeaFung.oneyear.new$conf[2]))
   }
}

# Need the standard error for weighted linear regression,
#  see eightMethods.count() for details:
MLEbins.nSeaFung.new = dplyr::tbl_df(MLEbins.nSeaFung.new)
MLEbins.nSeaFung.new = dplyr::mutate(MLEbins.nSeaFung.new,
                                     stdErr = (abs(confMin-b) +
                                               abs(confMax-b))/(2*1.96) )
MLEbins.nSeaFung.new
#> # A tibble: 30 x 8
#>     Year  xmin   xmax     n     b confMin confMax stdErr
#>    <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>   <dbl>  <dbl>
#>  1  1986  101. 28723.  917. -2.34   -2.43   -2.26 0.0449
#>  2  1987  100  25974.  849. -2.48   -2.58   -2.38 0.0510
#>  3  1988  102. 29440.  983. -2.83   -2.95   -2.72 0.0584
#>  4  1989  102. 35211. 1058. -2.69   -2.79   -2.59 0.0520
#>  5  1990  102. 34811.  830. -2.66   -2.77   -2.55 0.0574
#>  6  1991  100  26413. 1076. -2.82   -2.93   -2.72 0.0556
#>  7  1992  100  25317. 1102. -2.82   -2.93   -2.71 0.0546
#>  8  1993  100. 29440. 1122. -2.47   -2.56   -2.38 0.0441
#>  9  1994  102. 22201.  973. -2.97   -3.10   -2.85 0.0633
#> 10  1995  102. 31819.  937. -2.67   -2.78   -2.57 0.0546
#> # ... with 20 more rows

Now to plot the results and obtain the regression fit (uncomment the commented lines to save an explicit file):

The statistics for the regression fit (like those in Table S.1) are:

Method Low Trend High p Rsquared
0.0021 0.0115 0.021 0.02 0.18

For an equivalent to Table S.2 (results for each year for the MLEbins method), need the constant C for each year, so calculate it here:

Year xmin xmax n confMin b confMax C
1986 101.35 28722.51 916.52 -2.43 -2.34 -2.26 664.97
1987 100.00 25974.17 849.17 -2.58 -2.48 -2.38 1316.49
1988 101.61 29439.75 982.90 -2.95 -2.83 -2.72 8642.27
1989 101.61 35210.99 1058.09 -2.79 -2.69 -2.59 4175.41
1990 101.61 34811.19 830.41 -2.77 -2.66 -2.55 3509.26
1991 100.00 26412.52 1076.15 -2.93 -2.82 -2.72 8047.32
1992 100.00 25316.66 1101.53 -2.93 -2.82 -2.71 7848.38
1993 100.00 29439.75 1121.89 -2.56 -2.47 -2.38 1270.45
1994 101.61 22200.90 972.95 -3.10 -2.97 -2.85 17969.56
1995 101.61 31818.64 936.68 -2.78 -2.67 -2.57 3805.70
1996 101.61 36462.12 1117.99 -2.91 -2.80 -2.70 7503.47
1997 100.00 19360.99 1188.82 -2.83 -2.73 -2.63 5028.09
1998 101.61 22801.53 820.29 -2.68 -2.57 -2.47 2258.31
1999 101.61 36462.12 892.03 -2.62 -2.51 -2.42 1648.97
2000 101.35 22801.53 988.64 -3.09 -2.97 -2.85 17382.96
2001 101.61 18824.87 1927.54 -3.39 -3.28 -3.18 87714.71
2002 100.00 20464.76 1612.89 -2.66 -2.58 -2.50 2272.10
2003 100.00 21611.31 1402.35 -2.48 -2.40 -2.33 893.83
2004 101.61 30911.23 990.74 -2.43 -2.34 -2.26 658.50
2005 100.00 19360.99 1294.30 -2.57 -2.49 -2.41 1418.93
2006 100.00 21032.63 667.38 -2.39 -2.29 -2.19 487.54
2007 100.00 18299.10 1150.19 -2.99 -2.88 -2.77 10584.74
2008 100.00 28722.51 759.07 -2.44 -2.34 -2.25 654.72
2009 100.00 24036.35 732.37 -2.72 -2.60 -2.49 2542.18
2010 101.61 23293.68 1344.75 -2.46 -2.38 -2.31 820.76
2011 101.35 26723.46 1742.68 -2.69 -2.61 -2.54 2738.92
2012 100.00 14899.37 1177.94 -2.46 -2.38 -2.30 789.30
2013 100.00 25974.17 1213.46 -2.42 -2.34 -2.27 647.73
2014 100.00 30025.12 948.17 -2.24 -2.16 -2.09 245.86
2015 100.00 18362.51 792.09 -2.54 -2.44 -2.34 1088.61